Skip to content

Conversation

@HuaHuaY
Copy link
Contributor

@HuaHuaY HuaHuaY commented Jan 15, 2026

Rationale for this change

Support customizing more Zstd parameters to allow users to optimize the performance of parquet page compression/decompression.

What changes are included in this PR?

  1. Add a new class ZstdCodecOptions inherited from CodecOptions.
  2. Create ZSTD_CCtx/ZSTD_DCtx explicitly in ZSTDCodec.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes. The new class ZstdCodecOptions is added.

@HuaHuaY HuaHuaY requested a review from wgtmac as a code owner January 15, 2026 12:52
@github-actions
Copy link

⚠️ GitHub issue #48864 has been automatically assigned in GitHub to PR creator.

@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 15, 2026

@pitrou @wgtmac @mapleFU Please take a look. This PR only uses context and does not reuse it. I create a new context every time I call Compress and Decompress methods. I guess this is the same behavior as before.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @HuaHuaY . This can be a useful improvement, I've made a bunch of comments below.

@github-actions github-actions bot added awaiting committer review Awaiting committer review awaiting review Awaiting review and removed awaiting review Awaiting review labels Jan 15, 2026
@github-actions github-actions bot added awaiting review Awaiting review and removed awaiting review Awaiting review awaiting committer review Awaiting committer review labels Jan 15, 2026
@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 15, 2026

@pitrou Thank you for your detailed review! I've fixed these comments. Please take a look again.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly good! Just a few nits.

{5, GZipFormat::DEFLATE, -12, false},
{-992, GZipFormat::GZIP, 15, false}};

template <std::derived_from<arrow::util::CodecOptions> T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, a concept 😀

ARROW_ASSIGN_OR_RAISE(auto cctx, CreateCCtx());
size_t ret = ZSTD_compressCCtx(cctx.get(), output_buffer,
static_cast<size_t>(output_buffer_len), input,
static_cast<size_t>(input_len), compression_level_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funny that the compression level is passed a second time here, after we already gave it to ZSTD_CCtx_setParameter. I suppose it doesn't matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context will also be passed to ZSTDCompressor and it needs compression_level_. We can move auto ret = ZSTD_CCtx_setParameter(cctx.get(), ZSTD_c_compressionLevel, compression_level_) to MakeCompressor though I think the readability will decrease a bit.

Copy link
Contributor Author

@HuaHuaY HuaHuaY Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After I check the Zstd's document again, I think there is a mistake here. We need to use ZSTD_compress2 instead of ZSTD_compressCCtx because ZSTD_compressCCtx will reset other parameters. Thank you for your good attention!

/*! ZSTD_compressCCtx() :
 *  Same as ZSTD_compress(), using an explicit ZSTD_CCtx.
 *  Important : in order to mirror `ZSTD_compress()` behavior,
 *  this function compresses at the requested compression level,
 *  __ignoring any other advanced parameter__ .
 *  If any advanced parameter was set using the advanced API,
 *  they will all be reset. Only @compressionLevel remains.
 */
ZSTDLIB_API size_t ZSTD_compressCCtx(ZSTD_CCtx* cctx,
                                     void* dst, size_t dstCapacity,
                               const void* src, size_t srcSize,
                                     int compressionLevel);

/*! ZSTD_compress2() :
 *  Behave the same as ZSTD_compressCCtx(), but compression parameters are set using the advanced API.
 *  (note that this entry point doesn't even expose a compression level parameter).
 *  ZSTD_compress2() always starts a new frame.
 *  Should cctx hold data from a previously unfinished frame, everything about it is forgotten.
 *  - Compression parameters are pushed into CCtx before starting compression, using ZSTD_CCtx_set*()
 *  - The function is always blocking, returns when compression is completed.
 *  NOTE: Providing `dstCapacity >= ZSTD_compressBound(srcSize)` guarantees that zstd will have
 *        enough space to successfully compress the data, though it is possible it fails for other reasons.
 * @return : compressed size written into `dst` (<= `dstCapacity),
 *           or an error code if it fails (which can be tested using ZSTD_isError()).
 */
ZSTDLIB_API size_t ZSTD_compress2( ZSTD_CCtx* cctx,
                                   void* dst, size_t dstCapacity,
                             const void* src, size_t srcSize);

Copy link
Contributor Author

@HuaHuaY HuaHuaY Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new test TEST(TestCodecMisc, ZstdLargerWindowLog) to ensure the other parameters work effectively.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jan 15, 2026
@HuaHuaY HuaHuaY force-pushed the fix_issue_48864 branch 2 times, most recently from dfbca3b to 84dd650 Compare January 15, 2026 17:36
@HuaHuaY HuaHuaY force-pushed the fix_issue_48864 branch 2 times, most recently from 5d26717 to 698aefd Compare January 16, 2026 07:58
Copy link
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (I haven't gone through the test). Just left some minor comments.

}

TEST(TestWriterProperties, SetCodecOptions) {
constexpr int ZSTD_c_windowLog = 101;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use kZstdCWindowLog to be consistent with this code base?

Copy link
Contributor Author

@HuaHuaY HuaHuaY Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to keep the name style of zstd. Because I think it is more readable, although it is inconsistent with our coding style.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can just keep this in the tests.

@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 19, 2026

@pitrou @mapleFU Do you have any suggestions? Please take a look again.

}

TEST(TestWriterProperties, SetCodecOptions) {
constexpr int ZSTD_c_windowLog = 101;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we can just keep this in the tests.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General LGTM

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thank you @HuaHuaY . I'll rebase for CI.

@pitrou
Copy link
Member

pitrou commented Jan 21, 2026

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 2af1945

Submitted crossbow builds: ursacomputing/crossbow @ actions-512d34afcc

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou
Copy link
Member

pitrou commented Jan 21, 2026

@HuaHuaY The new tests need to be skipped if zstd is not available:
https://github.com/ursacomputing/crossbow/actions/runs/21206351575/job/61003673870#step:6:8214


[ RUN      ] TestCodecMisc.SpecifyCodecOptionsZstd
/arrow/cpp/src/arrow/util/compression_test.cc:458: Skipped
Support for this codec hasn't been built

[  SKIPPED ] TestCodecMisc.SpecifyCodecOptionsZstd (0 ms)
[ RUN      ] TestCodecMisc.ZstdLargerWindowLog
/arrow/cpp/src/arrow/util/compression_test.cc:549: Failure
Failed
'_error_or_value49.status()' failed with NotImplemented: Support for codec 'zstd' not built

[  FAILED  ] TestCodecMisc.ZstdLargerWindowLog (219 ms)
[ RUN      ] TestCodecMisc.ZstdStreamLargerWindowLog
/arrow/cpp/src/arrow/util/compression_test.cc:567: Failure
Failed
'_error_or_value51.status()' failed with NotImplemented: Support for codec 'zstd' not built

[  FAILED  ] TestCodecMisc.ZstdStreamLargerWindowLog (283 ms)

@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 21, 2026

@HuaHuaY The new tests need to be skipped if zstd is not available: https://github.com/ursacomputing/crossbow/actions/runs/21206351575/job/61003673870#step:6:8214


[ RUN      ] TestCodecMisc.SpecifyCodecOptionsZstd
/arrow/cpp/src/arrow/util/compression_test.cc:458: Skipped
Support for this codec hasn't been built

[  SKIPPED ] TestCodecMisc.SpecifyCodecOptionsZstd (0 ms)
[ RUN      ] TestCodecMisc.ZstdLargerWindowLog
/arrow/cpp/src/arrow/util/compression_test.cc:549: Failure
Failed
'_error_or_value49.status()' failed with NotImplemented: Support for codec 'zstd' not built

[  FAILED  ] TestCodecMisc.ZstdLargerWindowLog (219 ms)
[ RUN      ] TestCodecMisc.ZstdStreamLargerWindowLog
/arrow/cpp/src/arrow/util/compression_test.cc:567: Failure
Failed
'_error_or_value51.status()' failed with NotImplemented: Support for codec 'zstd' not built

[  FAILED  ] TestCodecMisc.ZstdStreamLargerWindowLog (283 ms)

My mistake. I have added an if statement to skip the tests. @pitrou

@pitrou
Copy link
Member

pitrou commented Jan 21, 2026

@github-actions crossbow submit cppminimal*

@github-actions
Copy link

Revision: a6b30fd

Submitted crossbow builds: ursacomputing/crossbow @ actions-3fe0b907c4

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions

@HuaHuaY
Copy link
Contributor Author

HuaHuaY commented Jan 21, 2026

Sorry that I have refactored a little code. Could you help me take a look and approve this PR again? @pitrou

@pitrou
Copy link
Member

pitrou commented Jan 21, 2026

@github-actions crossbow submit cppminimal*

@github-actions
Copy link

Revision: d5c8568

Submitted crossbow builds: ursacomputing/crossbow @ actions-c4d50b6264

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions

@pitrou pitrou merged commit 3ed9169 into apache:main Jan 21, 2026
52 checks passed
@pitrou pitrou removed the awaiting committer review Awaiting committer review label Jan 21, 2026
@pitrou
Copy link
Member

pitrou commented Jan 21, 2026

Thanks again @HuaHuaY !

@HuaHuaY HuaHuaY deleted the fix_issue_48864 branch January 21, 2026 14:16
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 3ed9169.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants